39 research outputs found

    Thompson Sampling for a Fatigue-aware Online Recommendation System

    In this paper we consider an online recommendation setting, where a platform recommends a sequence of items to its users at every time period. The users respond by selecting one of the recommended items or by abandoning the platform due to fatigue from seeing less useful items. Assuming a parametric stochastic model of user behavior, which captures the positional effects of these items as well as the abandonment behavior of users, the platform's goal is to recommend sequences of items that are competitive with the single best sequence of items in hindsight, without knowing the true user model a priori. Naively applying a stochastic bandit algorithm in this setting leads to an exponential dependence on the number of items. We propose a new Thompson sampling based algorithm whose expected regret is polynomial in the number of items in this combinatorial setting and which performs extremely well in practice.
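
For readers unfamiliar with Thompson sampling, the following is a minimal sketch of the textbook Beta-Bernoulli variant on independent arms. It is not the fatigue-aware, sequence-level algorithm of the paper; the function name `thompson_sampling`, the simulated click probabilities, and the uniform Beta(1, 1) prior are all assumptions made for illustration.

```python
import random


def thompson_sampling(click_probs, rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on simulated arms.

    click_probs: true (hidden) success probability of each arm; used only
                 to simulate feedback, never read by the learner directly.
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    n = len(click_probs)
    successes = [0] * n
    failures = [0] * n
    pulls = [0] * n

    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior
        # (Beta(1, 1) prior, an assumption of this sketch).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n)
        ]
        # Play the arm whose sampled rate is largest.
        arm = max(range(n), key=samples.__getitem__)
        pulls[arm] += 1

        # Simulate Bernoulli feedback and update that arm's posterior.
        if rng.random() < click_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls
```

Treating every possible sequence of items as its own arm in a scheme like this is what makes the naive approach scale exponentially in the number of items, which is the issue the abstract points to.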

    An Online Algorithm for Learning Buyer Behavior under Realistic Pricing Restrictions

    We propose a new efficient online algorithm to learn the parameters governing the purchasing behavior of a utility-maximizing buyer, who responds to prices, in a repeated interaction setting. The key feature of our algorithm is that it can learn even non-linear buyer utility while working with arbitrary price constraints that the seller may impose. This overcomes a major shortcoming of previous approaches, which use unrealistic prices to learn these parameters, making them unsuitable in practice.

    Block-Structure Based Time-Series Models For Graph Sequences

    Although the computational and statistical trade-off for modeling single graphs, for instance using block models, is relatively well understood, extending such results to sequences of graphs has proven to be difficult. In this work, we take a step in this direction by proposing two models for graph sequences that capture: (a) link persistence between nodes across time, and (b) community persistence of each node across time. In the first model, we assume that the latent community of each node does not change over time, and in the second model we relax this assumption suitably. For both of these proposed models, we provide statistically and computationally efficient inference algorithms, whose unique feature is that they leverage community detection methods that work on single graphs. We also provide experimental results validating the suitability of our models and methods on synthetic and real instances. Comment: 40 pages, 10 figures

    Optimizing Revenue over Data-driven Assortments

    We revisit the problem of large-scale assortment optimization under the multinomial logit choice model without any assumptions on the structure of the feasible assortments. Scalable real-time assortment optimization has become essential in e-commerce operations due to the need for personalization and the availability of a large variety of items. While this can be done when there are simplistic assortment choices to be made, not imposing any constraints on the collection of feasible assortments gives more flexibility to incorporate insights of store managers and historically well-performing assortments. We design fast and flexible algorithms based on variations of binary search that find the revenue of the (approximately) optimal assortment. We speed up the comparison steps using novel vector space embeddings, based on advances in the information retrieval literature. For an arbitrary collection of assortments, our algorithms can find a solution in time that is sub-linear in the number of assortments and, for the simpler case of cardinality constraints, linear in the number of items (existing methods are quadratic or worse). Empirical validations using the Billion Prices dataset and several retail transaction datasets show that our algorithms are competitive even when the number of items is $\sim 10^5$ (100x larger instances than previously studied). Comment: 28 pages, 4 figures
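
The binary-search idea mentioned in the abstract can be illustrated with a small sketch. Under the multinomial logit model, an assortment S with prices r_i and preference weights v_i has expected revenue R(S) = sum(r_i * v_i) / (v0 + sum(v_i)), so R(S) >= K holds exactly when sum(v_i * (r_i - K)) >= K * v0. The code below searches over K using that comparison. It is an illustration only: the function names, the normalization v0 = 1, and the plain linear scan in the comparison step are assumptions of this sketch, whereas the paper speeds that step up with vector space embeddings.

```python
def mnl_revenue(assortment, prices, weights, v0=1.0):
    """Expected revenue of one assortment under the MNL choice model."""
    numerator = sum(prices[i] * weights[i] for i in assortment)
    denominator = v0 + sum(weights[i] for i in assortment)
    return numerator / denominator


def some_assortment_reaches(assortments, prices, weights, k, v0=1.0):
    """Comparison step: does any feasible assortment earn at least k?

    Uses the equivalence R(S) >= k  <=>  sum_i v_i * (r_i - k) >= k * v0.
    A linear scan is used here purely for clarity.
    """
    return any(
        sum(weights[i] * (prices[i] - k) for i in s) >= k * v0
        for s in assortments
    )


def approx_best_revenue(assortments, prices, weights, tol=1e-6, v0=1.0):
    """Binary search for the optimal revenue, up to an additive tol.

    Assumes non-negative prices, positive weights, and a non-empty
    collection of assortments.
    """
    lo, hi = 0.0, max(prices)  # revenue always lies in [0, max price]
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if some_assortment_reaches(assortments, prices, weights, mid, v0):
            lo = mid  # some assortment achieves mid, so search higher
        else:
            hi = mid  # no assortment achieves mid, so search lower
    return lo
```

As a usage example, with prices `[10.0, 8.0, 5.0]`, weights `[0.5, 1.0, 2.0]` and feasible assortments `[{0}, {0, 1}, {0, 1, 2}, {2}]`, the search converges to the revenue of `{0, 1}`, which is 13 / 2.5 = 5.2.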

    Faster Reinforcement Learning Using Active Simulators

    In this work, we propose several online methods to build a learning curriculum from a given set of target-task-specific training tasks in order to speed up reinforcement learning (RL). These methods can decrease the total training time needed by an RL agent compared to training on the target task from scratch. Unlike traditional transfer learning, we consider creating a sequence from several training tasks in order to provide the most benefit in terms of reducing the total time to train. Our methods utilize the learning trajectory of the agent on the curriculum tasks seen so far to decide which tasks to train on next. An attractive feature of our methods is that they are weakly coupled to the choice of the RL algorithm as well as the transfer learning method. Further, when there is domain information available, our methods can incorporate such knowledge to further speed up the learning. We experimentally show that these methods can be used to obtain suitable learning curricula that reduce the overall training time on two different domains. Comment: 12 pages and 4 figures. More experiments added to the previous version

    Symmetry Learning for Function Approximation in Reinforcement Learning

    In this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL). This problem deserves ever-increasing attention given the recent advances in the use of deep networks for complex RL tasks, which require large amounts of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove its completeness. We also provide a framework to incorporate the discovered symmetries for function approximation. Finally, we show that the use of potential-based reward shaping is especially effective for our symmetry exploitation mechanism. Experiments on various classical problems show that our method improves the learning performance significantly by utilizing symmetry information. Comment: 12 pages, 3 figures. A preliminary version appears in AAMAS 2017. Also presented at the 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making

    Generalization Bounds for Learning with Linear, Polygonal, Quadratic and Conic Side Knowledge

    In this paper, we consider a supervised learning setting where side knowledge is provided about the labels of unlabeled examples. The side knowledge has the effect of reducing the hypothesis space, leading to tighter generalization bounds, and thus possibly better generalization. We consider several types of side knowledge, the first leading to linear and polygonal constraints on the hypothesis space, the second leading to quadratic constraints, and the last leading to conic constraints. We show how different types of domain knowledge can lead directly to these kinds of side knowledge. We prove bounds on complexity measures of the hypothesis space for quadratic and conic side knowledge, and show that these bounds are tight in a specific sense for the quadratic case. Comment: 37 pages, 3 figures, a shorter version appeared in ISAIM 2014 (new additions include a reference change and a new figure)

    The Costs and Benefits of Sharing: Sequential Individual Rationality and Sequential Fairness

    In designing dynamic shared service systems that incentivize customers to opt for shared rather than exclusive service, the traditional notion of individual rationality may be insufficient, as a customer's estimated utility could fluctuate arbitrarily during their time in the shared system, as long as their realized utility at service completion is not worse than that for exclusive service. In this work, within a model that explicitly considers the "inconvenience costs" incurred by customers due to sharing, we introduce the notion of sequential individual rationality (SIR), which requires that the disutility of existing customers is nonincreasing as the system state changes due to new customer arrivals. Next, under SIR, we observe that cost sharing can also be viewed as benefit sharing, which inspires a natural definition of sequential fairness (SF): the total incremental benefit due to a new customer is shared among existing customers in proportion to the incremental inconvenience suffered. We demonstrate the effectiveness of these notions by applying them to a ridesharing system, where unexpected detours to pick up subsequent passengers inconvenience the existing passengers. Imposing SIR and SF reveals interesting and surprising results, including: (a) natural limits on the incremental detours permissible, (b) exact characterization of "SIR-feasible" routes, which boast sublinear upper and lower bounds on the fractional detours, (c) exact characterization of sequentially fair cost sharing schemes, which includes a strong requirement that passengers must compensate each other for the detour inconveniences that they cause, and (d) new algorithmic problems related to and motivated by SIR. Comment: Presented as a poster at EC 2016. Presented as an invited talk (sponsored session) at INFORMS Annual Meeting 2016. Presented at MSOM Service Operations SIG 2017. Currently under review at Management Science
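
The proportional rule in the definition of sequential fairness can be written down in a few lines. The sketch below only restates the sentence from the abstract (each existing customer receives a share of the incremental benefit proportional to the incremental inconvenience they suffer); the function name, the dictionary-based interface, and the choice to reject the all-zero case are assumptions of this illustration, not the paper's formal model.

```python
def sequential_fair_shares(incremental_benefit, incremental_inconvenience):
    """Split a new customer's incremental benefit among existing customers.

    incremental_inconvenience maps each existing customer to the extra
    inconvenience caused by the new arrival (non-negative numbers).
    Each customer's share is proportional to that inconvenience.
    """
    total = sum(incremental_inconvenience.values())
    if total <= 0:
        # The proportional rule is undefined when nobody is inconvenienced;
        # this sketch refuses to guess how the benefit should be split.
        raise ValueError("total incremental inconvenience must be positive")
    return {
        customer: incremental_benefit * cost / total
        for customer, cost in incremental_inconvenience.items()
    }
```

For example, if a new passenger brings an incremental benefit of 4.0 and causes detour inconveniences of 2.0 and 6.0 to two existing passengers, the rule assigns them 1.0 and 3.0 respectively.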

    Learning to Partition using Score Based Compatibilities

    We study the problem of learning to partition users into groups, where one must learn the compatibilities between the users to achieve optimal groupings. We define four natural objectives that optimize for average and worst case compatibilities and propose new algorithms for adaptively learning optimal groupings. When we do not impose any structure on the compatibilities, we show that the group formation objectives considered are NP-hard to solve and we either give approximation guarantees or prove inapproximability results. We then introduce an elegant structure, namely that of intrinsic scores, that makes many of these problems polynomial time solvable. We explicitly characterize the optimal groupings under this structure and show that the optimal solutions are related to homophilous and heterophilous partitions, well-studied in the psychology literature. For one of the four objectives, we show NP-hardness under the score structure and give a $\frac{1}{2}$-approximation algorithm for which no constant approximation was known thus far. Finally, under the score structure, we propose an online low sample complexity PAC algorithm for learning the optimal partition. We demonstrate the efficacy of the proposed algorithm on synthetic and real world datasets. Comment: Appears in the Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017)

    Privacy-preserving Targeted Advertising

    Recommendation systems form the centerpiece of a rapidly growing trillion-dollar online advertisement industry. Even with numerous optimizations and approximations, collaborative filtering (CF) based approaches require real-time computations involving very large vectors. Curating and storing such related profile information vectors on web portals seriously breaches the user's privacy. Modifying such systems to achieve private recommendations further requires communication of long encrypted vectors, making the whole process inefficient. We present a more efficient recommendation system alternative, in which user profiles are maintained entirely on their device, and appropriate recommendations are fetched from web portals in an efficient privacy-preserving manner. We base this approach on association rules. Comment: A preliminary version was presented at the 11th INFORMS Workshop on Data Mining and Decision Analytics (2016)